Dynamic Tree Cut: in-depth description, tests and applications
Abstract
In hierarchical clustering, clusters are defined as branches of a cluster tree. The constant height branch cut, a commonly used method to identify branches of a cluster tree, is not ideal for cluster identification in complicated dendrograms. We describe a new dynamic branch cutting approach for detecting clusters in a cluster tree based on their shape. Compared to the constant height cutoff, our techniques offer the following advantages: (1) they are capable of identifying nested clusters; (2) they are flexible: cluster shape parameters can be tuned to suit the application at hand; (3) they are suitable for automation; (4) we find that they work well for finding modules in protein–protein interaction and gene co-expression networks. Additionally, our methods can optionally combine the advantages of hierarchical clustering and partitioning around medoids, giving better detection of outliers. We describe the Dynamic Tree Cut algorithms in detail and give examples illustrating their use. The Dynamic Tree Cut package and example scripts, all implemented in the R language, can be downloaded from http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting.

1 Why Dynamic Tree Cut?

Hierarchical clustering is a popular data mining method for detecting clusters of closely related objects in data [7]; a major application in bioinformatics is clustering of gene expression profiles. Hierarchical clustering organizes objects into a hierarchical cluster tree (dendrogram) whose branches are the desired clusters. The process of identifying individual branches is variously referred to as branch cutting, tree cutting, or dendrogram pruning. The most widely used tree cut method is the fixed height branch cut: the user chooses a fixed height on the dendrogram, and each contiguous branch of objects below that height is considered a separate cluster.
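The fixed height branch cut described above can be illustrated with a minimal Python sketch. It is not taken from the Dynamic Tree Cut package; the function name `cut_fixed_height`, the merge-list encoding (the i-th merge creates composite id n + i), and the example heights are all assumptions made for illustration. The sketch also assumes that merge heights never decrease along the tree, and it treats a merge exactly at the cut height as lying below the cut.

```python
def cut_fixed_height(n_objects, merges, height):
    """Label each original object by the branch it falls in below `height`.

    merges: list of (a, b, h) in merge order. Ids 0..n_objects-1 are the
    original objects; the i-th merge creates composite id n_objects + i
    (a hypothetical encoding chosen for this sketch).
    Assumes merge heights are non-decreasing (a monotone dendrogram).
    """
    parent = list(range(n_objects + len(merges)))  # union-find forest

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, (a, b, h) in enumerate(merges):
        if h <= height:  # only merges at or below the cut join branches
            new_id = n_objects + i
            parent[find(a)] = new_id
            parent[find(b)] = new_id

    # Number the clusters in order of first appearance.
    label_of_root = {}
    return [label_of_root.setdefault(find(obj), len(label_of_root))
            for obj in range(n_objects)]


# Made-up dendrogram on six objects (heights are illustrative only).
EXAMPLE_MERGES = [
    (0, 1, 0.10),  # -> composite 6 = {0, 1}
    (2, 3, 0.15),  # -> composite 7 = {2, 3}
    (6, 4, 0.30),  # -> composite 8 = {0, 1, 4}
    (8, 7, 0.80),  # -> composite 9 = {0, 1, 2, 3, 4}
    (9, 5, 0.95),  # -> composite 10 = all objects
]

if __name__ == "__main__":
    print(cut_fixed_height(6, EXAMPLE_MERGES, 0.5))
```

Cutting this toy tree at height 0.5 yields the branches {0, 1, 4} and {2, 3} plus the singleton {5}; raising the cut above 0.95 merges everything into one cluster, which shows how strongly the result depends on the single chosen height.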
When detecting gene clusters (also referred to as modules), one typically also requires each cluster to have size at least N0, a chosen constant number. The fixed height branch cut is a simple and elegant technique with many desirable properties, but it is not ideal in situations where one expects a complicated dendrogram structure with nested clusters. Examples of such situations are described in Section 4.2. It should be noted that this is not a fault of hierarchical clustering as such: often the dendrogram exhibits distinct branches corresponding to the desired modules, but no single fixed cut height can identify them correctly. To automatically detect the clusters, the tree cut method should identify branches based on their shape, not on absolute height.

For the benefit of the reader, we now briefly review hierarchical clustering and the structure of the resulting dendrograms; for more details we refer the reader to any of a large number of textbooks such as [4]. Hierarchical clustering is a class of agglomerative clustering techniques that iteratively merge the two closest objects into a new composite object (cluster). Composite objects are further merged with other (original or composite) objects until all objects have been merged. Hierarchical clustering methods differ primarily on how the dissimilarity...

∗ Joint first authors. † To whom correspondence should be addressed.
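The agglomerative procedure reviewed above (repeatedly merging the two closest objects until a single object remains) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function name `agglomerate` and the toy dissimilarity matrix are hypothetical, and average linkage is an assumption chosen here for concreteness, since the excerpt does not specify how dissimilarity between composite objects is computed.

```python
def agglomerate(dissim):
    """Agglomerative clustering on a symmetric dissimilarity matrix.

    Returns the merges as (id_a, id_b, height) in merge order. Ids
    0..n-1 are the original objects; each merge creates the next unused
    id (n, n + 1, ...). Average linkage is an assumption of this sketch.
    """
    n = len(dissim)
    members = {i: [i] for i in range(n)}  # active id -> original objects
    merges = []
    next_id = n

    def linkage(a, b):
        # Average dissimilarity over all pairs of original objects.
        pairs = [(i, j) for i in members[a] for j in members[b]]
        return sum(dissim[i][j] for i, j in pairs) / len(pairs)

    while len(members) > 1:
        ids = sorted(members)
        # Find the two closest active (original or composite) objects.
        a, b = min(
            ((x, y) for k, x in enumerate(ids) for y in ids[k + 1:]),
            key=lambda pair: linkage(*pair),
        )
        merges.append((a, b, linkage(a, b)))
        # Replace the two merged objects by one composite object.
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return merges


# Toy dissimilarity matrix for four objects (values are made up).
EXAMPLE_DISSIM = [
    [0, 1, 6, 7],
    [1, 0, 5, 8],
    [6, 5, 0, 2],
    [7, 8, 2, 0],
]

if __name__ == "__main__":
    for a, b, h in agglomerate(EXAMPLE_DISSIM):
        print(a, b, h)
```

The recorded heights are the dissimilarities at which each merge happens; they are what a dendrogram plots on its vertical axis and what any tree cut method operates on. The quadratic search for the closest pair keeps the sketch short but is not efficient for large data sets.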